# 5-Stage RISC-V RV32I Datapath



## Pipelined CPU Design

Now we will optimize the single-cycle CPU using pipelining. Pipelining is a powerful logic design technique that shortens the clock period and improves throughput, even though it increases the latency of an individual instruction and requires additional logic. In a pipelined CPU, the execution of multiple instructions overlaps. This is a good example of parallelism, one of the great ideas in computer architecture. To obtain a pipelined CPU, we take the following steps.

## **Step 1: Pipeline Registers**

Pipelining starts by splitting a large block of combinational logic with pipeline registers. We have already divided the single-cycle CPU into five stages, so we add a pipeline register between each pair of adjacent stages.

## **Step 2: Performance Analysis**

A great advantage of pipelining is the performance improvement that comes from a shorter clock period. We will use the same timing parameters as in the previous discussion.

| Element    | Register<br>clk-to-q             | Register<br>Setup | MUX           | ALU           | Mem<br>Read       | Mem<br>Write       | RegFile<br>Read  | RegFile<br>Setup  |
|------------|----------------------------------|-------------------|---------------|---------------|-------------------|--------------------|------------------|-------------------|
| Parameter  | $t_{\rm clk\text{-}to\text{-}q}$ | $t_{\rm setup}$   | $t_{\rm mux}$ | $t_{\rm ALU}$ | $t_{\rm MEMread}$ | $t_{\rm MEMwrite}$ | $t_{\rm RFread}$ | $t_{\rm RFsetup}$ |
| Delay (ps) | 30                               | 20                | 25            | 200           | 250               | 200                | 150              | 20                |

Q1. What were the clock period and frequency of the single-cycle CPU?

Q2. What are the clock period and frequency of the pipelined CPU?

Q3. What is the speed-up? Why is it less than five?
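One way to check your answers is to sum the delays along the slowest path. The sketch below makes assumptions that are not stated above: the single-cycle critical path is taken to be a load that passes through instruction memory, the register file read, one MUX, the ALU, data memory, and the write-back MUX, and each pipeline stage is assumed to start with a register clk-to-q and end with a register setup. Adjust the paths if your datapath differs.

```python
# Delays in picoseconds, copied from the table above.
T = {
    "clk_to_q": 30, "setup": 20, "mux": 25, "alu": 200,
    "mem_read": 250, "mem_write": 200, "rf_read": 150, "rf_setup": 20,
}

# ASSUMED single-cycle critical path (a load):
# PC clk-to-q -> IMEM read -> RegFile read -> MUX -> ALU -> DMEM read -> MUX -> RegFile setup
single_cycle_ps = (T["clk_to_q"] + T["mem_read"] + T["rf_read"] + T["mux"]
                   + T["alu"] + T["mem_read"] + T["mux"] + T["rf_setup"])

# ASSUMED per-stage paths: pipeline register clk-to-q, stage logic, then setup.
stage_ps = {
    "IF":  T["clk_to_q"] + T["mem_read"] + T["setup"],
    "REG": T["clk_to_q"] + T["rf_read"] + T["setup"],
    "EX":  T["clk_to_q"] + T["mux"] + T["alu"] + T["setup"],
    "MEM": T["clk_to_q"] + T["mem_read"] + T["setup"],
    "WB":  T["clk_to_q"] + T["mux"] + T["rf_setup"],
}

# The clock must be long enough for the slowest stage.
pipelined_ps = max(stage_ps.values())

def freq_ghz(period_ps):
    """Convert a clock period in ps to a frequency in GHz."""
    return 1000.0 / period_ps

speedup = single_cycle_ps / pipelined_ps

print(f"single-cycle: {single_cycle_ps} ps, {freq_ghz(single_cycle_ps):.2f} GHz")
print(f"pipelined:    {pipelined_ps} ps, {freq_ghz(pipelined_ps):.2f} GHz")
print(f"speed-up:     {speedup:.2f}x")
```

Under these assumptions the single-cycle period comes to 950 ps and the pipelined period to 300 ps. The speed-up stays below five because the stages are not balanced and every stage also pays the register clk-to-q and setup overhead.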

## **Step 3: Dealing with Hazards**

The performance improvement comes at a cost. Pipelining introduces pipeline hazards we have to overcome.

#### Structural Hazard

Structural hazards occur when more than one instruction needs the same resource at the same time.

- **Register File:** One instruction reads from the register file while another writes to it. We can solve this by having separate read and write ports and writing to the register file at the falling edge of the clock.
- **Memory:** The memory is accessed not only for the instruction but also for the data. Separate caches for instructions and data solve this hazard.

#### Data Hazard and Forwarding

Data hazards occur due to data dependencies among instructions. Forwarding can solve many data hazards.

Q1. Spot the data dependencies in the code below and figure out how forwarding can resolve data hazards.

| Instruction     | C0 | C1  | C2  | C3  | C4  | C5  | C6 |
|-----------------|----|-----|-----|-----|-----|-----|----|
| addi t0, s0, -1 | IF | REG | EX  | MEM | WB  |     |    |
| and s2, t0, a0  |    | IF  | REG | EX  | MEM | WB  |    |
| sw s0, 100(t0)  |    |     | IF  | REG | EX  | MEM | WB |

Q2. In general, under what conditions will the EX stage need to take forwarded inputs from previous instructions? Where should those inputs come from relative to the current cycle? Assume you have the signals ALUout(n), rs1(n), rs2(n), rd(n), and regWrite(n), where n is 0 for the instruction currently in the EX stage, -1 for the previous instruction, and so on.
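The forwarding decision can be sketched as a small selection function. This is a minimal illustration, not a definitive control unit: the `Instr` record, the `forward_source` helper, and the use of the RISC-V field names `rs1`, `rs2`, and `rd` are assumptions made for this example. It forwards from the most recent earlier instruction that writes the needed register and never forwards `x0`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    """Hypothetical record of the fields the forwarding logic looks at."""
    rd: Optional[int]     # destination register number, None if there is none
    rs1: Optional[int]    # first source register number
    rs2: Optional[int]    # second source register number
    reg_write: bool       # regWrite control signal

def forward_source(src, prev1, prev2):
    """Pick the source of one EX operand.

    src   : register number the instruction in EX wants to read
    prev1 : instruction n = -1 (its result sits in the EX/MEM register)
    prev2 : instruction n = -2 (its result sits in the MEM/WB register)
    """
    if src is None or src == 0:        # x0 is hard-wired to zero
        return "regfile"
    # The most recent writer wins, so check n = -1 before n = -2.
    if prev1 is not None and prev1.reg_write and prev1.rd == src:
        return "ALUout(-1)"
    if prev2 is not None and prev2.reg_write and prev2.rd == src:
        return "ALUout(-2)"
    return "regfile"

T0, S0, A0, S2 = 5, 8, 10, 18          # RISC-V register numbers

addi = Instr(rd=T0, rs1=S0, rs2=None, reg_write=True)    # addi t0, s0, -1
and_ = Instr(rd=S2, rs1=T0, rs2=A0, reg_write=True)      # and  s2, t0, a0
sw   = Instr(rd=None, rs1=T0, rs2=S0, reg_write=False)   # sw   s0, 100(t0)

print(forward_source(and_.rs1, addi, None))   # t0 for `and`
print(forward_source(sw.rs1, and_, addi))     # t0 for `sw`
print(forward_source(sw.rs2, and_, addi))     # s0 for `sw`
```

For the code in Q1, the sketch selects `ALUout(-1)` for the `t0` operand of `and`, `ALUout(-2)` for the `t0` operand of `sw`, and the register file for `s0`.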

#### Data Hazard and Stall

Q1. Spot the data dependencies in the code below and figure out why forwarding cannot resolve this hazard.

| Instruction      | C0 | C1  | C2  | C3  | C4  | C5 |
|------------------|----|-----|-----|-----|-----|----|
| lw t0, 20(s0)    | IF | REG | EX  | MEM | WB  |    |
| add t1, t0, t0   |    | IF  | REG | EX  | MEM | WB |

Q2. What can we do to solve this hazard?
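A load produces its value only at the end of MEM, which is the same cycle in which the next instruction would already be in EX. One common remedy is to hold the dependent instruction for one cycle and then forward the loaded value. The check below is a minimal sketch of that detection; the function name `needs_load_use_stall` and its parameters are assumptions made for this example.

```python
def needs_load_use_stall(ex_is_load, ex_rd, reg_rs1, reg_rs2):
    """Return True if the instruction in REG must stall for one cycle.

    ex_is_load : True if the instruction currently in EX is a load
    ex_rd      : destination register number of the instruction in EX
    reg_rs1/2  : source register numbers of the instruction in REG
                 (None if a source is unused)
    """
    if not ex_is_load or ex_rd == 0:   # writes to x0 are discarded
        return False
    return ex_rd in (reg_rs1, reg_rs2)

T0, S0 = 5, 8  # RISC-V register numbers for t0 and s0

# lw t0, 20(s0) is in EX while the next instruction, which reads t0 twice, is in REG.
stall = needs_load_use_stall(True, T0, T0, T0)

# An independent instruction that reads only s0 does not need to wait.
no_stall = needs_load_use_stall(True, T0, S0, None)

print(stall, no_stall)
```

When the check fires, the pipeline inserts a bubble behind the load. Alternatively, the compiler or programmer can place an independent instruction in the slot after the load so that no cycle is wasted.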

#### Control Hazard and Prediction

Control hazards occur due to jumps and branches. We could stall the pipeline, but this decreases performance.

Q1. What options are there when we encounter a control hazard?

## **Putting It All Together**

| Instruction          | C0 | C1  | C2  | C3  | C4  | C5  | C6  | C7  | C8 |
|----------------------|----|-----|-----|-----|-----|-----|-----|-----|----|
| 1. addi t0, s0, -1   | IF | REG | EX  | MEM | WB  |     |     |     |    |
| 2. and s2, t0, a0    |    | IF  | REG | EX  | MEM | WB  |     |     |    |
| 3. sw s2, 100(t0)    |    |     | IF  | REG | EX  | MEM | WB  |     |    |
| 4. beq s0, s3, label |    |     |     | IF  | REG | EX  | MEM | WB  |    |
| 5. add t2, x0, x0    |    |     |     |     | IF  | REG | EX  | MEM | WB |

Given the RISC-V code above and a pipelined CPU with no forwarding, how many hazards are there? What type is each hazard? Consider all possible hazards across all pairs of instructions.

How many stalls would there need to be in order to fix the data hazard(s)? What about the control hazard(s)?
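The data-hazard part of this exercise can be checked with a short script. This is a sketch under stated assumptions: there is no forwarding, the register file can be written and then read within the same cycle, and an instruction may therefore sit in REG no earlier than the cycle in which its producer is in WB. The helper names are made up for this example. Control hazards are not modeled, because the number of stalls there depends on the stage in which the branch is resolved.

```python
# (destination, sources) for each instruction; None means no register is written.
program = [
    ("t0", ["s0"]),         # 1. addi t0, s0, -1
    ("s2", ["t0", "a0"]),   # 2. and  s2, t0, a0
    (None, ["s2", "t0"]),   # 3. sw   s2, 100(t0)
    (None, ["s0", "s3"]),   # 4. beq  s0, s3, label
    ("t2", ["x0"]),         # 5. writes t2, reads only x0
]

def find_raw_hazards(program, window=2):
    """List (producer, consumer, register) pairs that are close enough to conflict.

    ASSUMPTION: with same-cycle write-then-read, only consumers one or two
    instructions behind the producer read a stale value.
    """
    hazards = []
    for j, (dest, _) in enumerate(program):
        if dest is None or dest == "x0":
            continue
        last = min(j + window, len(program) - 1)
        for i in range(j + 1, last + 1):
            if dest in program[i][1]:
                hazards.append((j + 1, i + 1, dest))
    return hazards

def count_data_stalls(program):
    """Count bubbles needed when each consumer waits in REG for its producers' WB."""
    reg_cycle = []   # cycle in which each instruction is in the REG stage
    stalls = 0
    for i, (_, srcs) in enumerate(program):
        earliest = reg_cycle[-1] + 1 if reg_cycle else 1
        ready = earliest
        for j in range(i):
            dest = program[j][0]
            if dest is not None and dest != "x0" and dest in srcs:
                # WB comes three cycles after REG (REG -> EX -> MEM -> WB).
                ready = max(ready, reg_cycle[j] + 3)
        stalls += ready - earliest
        reg_cycle.append(ready)
    return stalls

print(find_raw_hazards(program))
print(count_data_stalls(program))
```

Under these assumptions the script reports three data hazards (1 to 2 on `t0`, 1 to 3 on `t0`, 2 to 3 on `s2`) and four stall cycles. The 1 to 3 hazard costs nothing extra because instruction 3 is already delayed behind instruction 2.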